#########################################################################################################################
# Title: Media Measurement Matters: Estimating the Persuasive Effects of Partisan Media with Survey and Behavioral Data #
# Authors: Chloe Wittenberg, Matthew A. Baum, Adam J. Berinsky, Justin de Benedictis-Kessner, and Teppei Yamamoto       #
# Corresponding Email: cwitten@mit.edu									          #
# This version: January 11, 2023										          #
#########################################################################################################################

To use these replication files, download and save all of the files in the same directory, in their original format. Make sure that the files listed under "Data Files" below are nested in the correct sub-folders (data and sensitivity). 
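
As a quick check of the folder layout described above, the following R sketch lists any expected file that is missing. The file names are taken from the "Data Files" section of this README; the helper find_missing_files() is hypothetical (it is not part of the archive), and the sketch assumes the working directory is the top-level replication folder.

```r
# Files this README lists under "Data Files"; paths are relative to the archive root.
expected_files <- c(
  file.path("data", c(
    "web_raw.csv", "web_raw.rds", "bakshy_top500.csv", "manual_coding_bma.csv",
    "eady_media_scores.csv", "manual_coding_eady.csv", "survey_data.csv",
    "news_consump.csv", "url_ratings.csv", "url_mturk.csv", "weights.csv",
    "guess_sites2016.csv"
  )),
  file.path("sensitivity", c(
    "bounds_frechet_function.R",
    "results_actions_index_use.RData",
    "results_charter_index_use.RData"
  ))
)

# Hypothetical helper (not part of the archive): returns the expected paths
# that do not exist under `root`.
find_missing_files <- function(root = ".", files = expected_files) {
  files[!file.exists(file.path(root, files))]
}

# Assumes the working directory is the top-level replication folder.
missing_files <- find_missing_files()
if (length(missing_files) > 0) {
  message("Missing files:\n", paste(" -", missing_files, collapse = "\n"))
} else {
  message("All expected data and sensitivity files were found.")
}
```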

All analyses were conducted using R version 4.0.3 and RStudio version 2022.07.2 ("Spotted Wakerobin" for macOS).  
Platform: x86_64-apple-darwin17.0
Running under: macOS Big Sur 11.6.8  

##############
# Code Files #
##############

The replication archive contains the following R scripts:

- 0_web_preprocessing_NOT RUN: This script reads in raw web-tracking data (incl. URLs) and pre-processes these data for later use in the 1_web_data_prep.R file. In this script, we use a dictionary-based classifier to identify URLs that are likely to correspond to "hard news," recode news domain values to match our external measures of domain slant, and flag sequential duplicates. The output of this script is an RDS/CSV file entitled "web_raw."

NOTE: Although we provide the code used to complete these steps, we cannot provide the underlying web-tracking data, because the URLs may contain personally identifiable information that could compromise respondent privacy and confidentiality. We instead provide a copy of web_raw (CSV and RDS) in the data folder for use in subsequent analyses.

- 1_web_data_prep: This script reads in data from web_raw.rds (created in the previous script) and performs additional cleaning of the web-tracking data (incl. removal of sequential duplicates and matching of the comScore data to external measures of domain slant). Measures of domain slant are pulled in from "bakshy_top500.csv" (Bakshy et al. 2015) and "eady_media_scores.csv" (Eady et al. 2019). The output of this script is an RDS/CSV file entitled "web_data."
- 2_survey_cleaning: This script reads in data from web_data.rds (cleaned web-tracking data); survey_data.csv (survey responses); and news_consump.csv (observed news consumption), cleans and recodes the survey data, and constructs respondent-level measures of relative volume and slant (both with and without hard news domains). The output of this script is a cleaned survey file (CSV/RDS) entitled "survey_data_cleaned" and a version of the web data with demographic information appended ("web_use.rds"). It also produces the tables contained in Appendix A.
- 3_main_analysis: This script uses data from web_use.rds and survey_data_cleaned.rds to generate all of the results presented in the main manuscript (Figures 1-5).
- 4_appendix_descriptives: This script uses data from web_use.rds and survey_data_cleaned.rds to create the descriptive analyses reported in Appendices E-H in the online supplement.
- 5_appendix_experimental: This script uses data from web_use.rds and survey_data_cleaned.rds to produce the experimental analyses reported in Appendices I-K in the online supplement.
- 6_appendix_hard_news: This script uses data from web_use.rds and survey_data_cleaned.rds to generate the results in Appendix L, which recreates both the descriptive and experimental results when estimating relative volume/slant using just URLs that are predicted to correspond to "hard news" stories. This script also includes steps for validating the "hard news" classifier, based on comparing a set of crowdsourced ratings from MTurk (url_mturk.csv) to a sample of URLs tagged by the classifier (url_ratings.csv).
- 7_appendix_alternative_binning: This script uses data from survey_data_cleaned.rds to create the results shown in Appendix M. This section re-estimates the persuasion results using several coding strategies, including continuous measures of relative volume/slant, as well as alternative binning strategies (e.g., quartiles, quintiles).
- 8_appendix_portals: This script uses data from web_use.rds and survey_data_cleaned.rds to reproduce all results contained in Appendices N-O in the online supplement. Appendix N presents descriptive and experimental results when including portal sites (msn.com, aol.com), and Appendix O presents these same results when excluding both portal sites and visits to Yahoo! News domains.
- 9_appendix_alt_scores: This script uses data from web_use.rds and survey_data_cleaned.rds to generate the figures and tables presented in Appendix P. This section replicates the descriptive and experimental results shown in the main manuscript using an alternative measure of domain slant, based on ratings from Eady et al. (2019) instead of Bakshy et al. (2015).
- 10_appendix_weights: This script uses data from web_use.rds and survey_data_cleaned.rds to recreate the figures and tables in Appendix Q, which incorporate sampling weights into both the descriptive and experimental results.
- 11_appendix_finance: This script uses data from web_use.rds and survey_data_cleaned.rds to assess the sensitivity of our results to the inclusion of financial domains, using simulated data to produce all of the figures and tables presented in Appendix S. Caution: this script takes some time to run!

- 12_appendix_sensitivity: This script uses data from survey_data_cleaned.rds to perform the sensitivity analyses summarized in Appendix R, based on guidance from Knox et al. (2019). Caution: this script takes some time to run! 

For convenience, the sensitivity folder contains two .RData files (results_actions_index_use and results_charter_index_use) that can be used to bypass the computationally intensive sampling procedure.

- helper_functions: set of helper functions used to generate plots and calculate overlapping coefficients across scripts

- bounds_frechet_function: [within sensitivity folder] function used to calculate the sensitivity bounds in 12_appendix_sensitivity.R and shown in Appendix R (based on Knox et al. 2019).
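
Because each script reads files written by earlier ones (script 2 uses the output of script 1, and scripts 3-12 use the output of script 2), the scripts should be run in numerical order. The sketch below shows one way to do this from R. It rests on assumptions not confirmed by this README: that every script carries a ".R" extension, that the scripts resolve file paths relative to the top-level replication folder, and that each script loads the helper files it needs. The wrapper run_replication() is hypothetical and not part of the archive. Script 0 is omitted because the raw web-tracking data it requires are not provided.

```r
# Script names as listed in this README; the ".R" extension is assumed for all.
scripts <- paste0(c(
  "1_web_data_prep", "2_survey_cleaning", "3_main_analysis",
  "4_appendix_descriptives", "5_appendix_experimental", "6_appendix_hard_news",
  "7_appendix_alternative_binning", "8_appendix_portals", "9_appendix_alt_scores",
  "10_appendix_weights", "11_appendix_finance", "12_appendix_sensitivity"
), ".R")

# Hypothetical wrapper (not part of the archive). With dry_run = TRUE it only
# reports which scripts were found; with dry_run = FALSE it sources them in order.
run_replication <- function(root = ".", files = scripts, dry_run = TRUE) {
  found <- file.exists(file.path(root, files))
  if (!all(found)) {
    warning("Not found: ", paste(files[!found], collapse = ", "))
  }
  if (!dry_run) {
    # Assumption: scripts expect the archive root as the working directory.
    old_wd <- setwd(root)
    on.exit(setwd(old_wd), add = TRUE)
    for (f in files[found]) {
      message("Running ", f)
      source(f, echo = FALSE)
    }
  }
  invisible(files[found])
}

# Dry run first; scripts 11 and 12 take some time when run for real.
available <- run_replication(dry_run = TRUE)
```

Calling run_replication(dry_run = FALSE) from the top-level folder would then execute the scripts that were found, in the order listed.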


##############
# Data Files #
##############

The data folder contains the following files, arranged in the order of their use in the preceding R scripts:

- web_raw.csv/web_raw.rds: raw web-tracking data, with URLs removed, sequential duplicates flagged, and "hard news" classifier applied

- bakshy_top500.csv: list of domain-level "alignment scores" for top 500 domains in Bakshy et al. (2015)

- manual_coding_bma.csv: manual recoding of comScore domains to identify matches to Bakshy et al. (2015) scores

- eady_media_scores.csv: list of domain-level "alignment scores", via Eady et al. (2019)

- manual_coding_eady.csv: manual recoding of comScore domains to identify matches to Eady et al. (2019) scores

- survey_data.csv: raw survey responses needed to replicate the results in the main analyses

- news_consump.csv: data supplied by comScore about each respondent's total site visits in the pre-study period

- url_ratings.csv: classifier output for a sample of 2000 URLs from our web-tracking data

- url_mturk.csv: MTurk ratings of the sample of 2000 URLs (five ratings per URL)

- weights.csv: raked survey weights for each respondent (also calculated in 10_appendix_weights.R)

- guess_sites2016.csv: site visits to news outlets in Guess (2021); used to set assumptions in Appendix S (11_appendix_finance.R) about the frequency of visits to wsj.com


The sensitivity folder contains the following files (in addition to bounds_frechet_function.R, described above):
- results_actions_index_use.RData: sensitivity bounds calculated for the sharing index

- results_charter_index_use.RData: sensitivity bounds calculated for the attitudinal index
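
The two cached .RData files can be inspected before they are used in place of the sampling procedure. The sketch below loads each file into its own environment and prints the names of the objects it contains. This README does not document those object names, so the code discovers them rather than assuming them. The helper load_cached() is hypothetical (not part of the archive), and the paths assume the working directory is the top-level replication folder.

```r
# Hypothetical helper (not part of the archive): loads an .RData file into a
# separate environment so the saved objects do not overwrite workspace objects.
load_cached <- function(path) {
  env <- new.env()
  loaded <- load(path, envir = env)  # load() returns the names of the restored objects
  message(basename(path), " contains: ", paste(loaded, collapse = ", "))
  env
}

for (f in c("results_actions_index_use.RData", "results_charter_index_use.RData")) {
  path <- file.path("sensitivity", f)
  if (file.exists(path)) {
    cached <- load_cached(path)
    # Show the top-level structure of whatever objects were saved in the file.
    str(as.list(cached), max.level = 1)
  }
}
```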


###############
# Other Files #
###############

- README.txt: Readme file containing a description of all files

- survey_codebook.xlsx: Excel codebook describing each variable included in the raw survey data

- Media_Measurement_Matters_CondAccept.pdf: PDF version of the conditionally accepted manuscript and online appendix


###################
# Data References #
###################

Bakshy, Eytan, Solomon Messing, and Lada Adamic. 2019. “Replication Data for: Exposure to Ideologically Diverse News and Opinion on Facebook.” Harvard Dataverse. https://doi.org/10.7910/DVN/AAI7VA.

Eady, Gregory, Jonathan Nagler, Andy Guess, Jan Zilinsky, and Joshua A. Tucker. 2019. “How Many People Live in Political Bubbles on Social Media? Evidence From Linked Survey and Twitter Data.” SAGE Open 9(1): 2158244019832705. https://doi.org/10.1177/2158244019832705.

Guess, Andrew M. 2020. “Replication Data for: (Almost) Everything in Moderation: New Evidence on Americans’ Online Media Diets.” Harvard Dataverse. https://doi.org/10.7910/DVN/ZFE3NE.

Knox, Dean, Teppei Yamamoto, Matthew A. Baum, and Adam J. Berinsky. 2019. “Design, Identification, and Sensitivity Analysis for Patient Preference Trials.” Journal of the American Statistical Association 114 (528): 1532–46. https://doi.org/10.1080/01621459.2019.1585248.
